135 research outputs found
Multi-Engine Approach for Named Entity Recognition in Bengali
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
Elevating Code-mixed Text Handling through Auditory Information of Words
With the growing popularity of code-mixed data, there is an increasing need
for better handling of this type of data, which poses a number of challenges,
such as dealing with spelling variations, multiple languages, different
scripts, and a lack of resources. Current language models face difficulty in
effectively handling code-mixed data as they primarily focus on the semantic
representation of words and ignore the auditory phonetic features. This leads
to difficulties in handling spelling variations in code-mixed text. In this
paper, we propose an effective approach for creating language models for
handling code-mixed textual data using auditory information of words from
SOUNDEX. Our approach includes a pre-training step based on
masked-language-modelling, which includes SOUNDEX representations (SAMLM) and a
new method of providing input data to the pre-trained model. Through
experimentation on various code-mixed datasets (of different languages) for
sentiment, offensive and aggression classification tasks, we establish that our
novel language modeling approach (SAMLM) results in improved robustness towards
adversarial attacks on code-mixed classification tasks. Additionally, our SAMLM
based approach also results in better classification results over the popular
baselines for code-mixed tasks. We use the explainability technique, SHAP
(SHapley Additive exPlanations) to explain how the auditory features
incorporated through SAMLM assist the model to handle the code-mixed text
effectively and increase robustness against adversarial attacks
\footnote{Source code has been made available on
\url{https://github.com/20118/DefenseWithPhonetics},
\url{https://www.iitp.ac.in/~ai-nlp-ml/resources.html\#Phonetics}}.Comment: Accepted to EMNLP 202
- …